Text File | 1990-07-19 | 28.8 KB | 705 lines

CHAPTER 15 - SUBROUTINES

It is now time to talk about subroutines. If you have only used BASIC this may be difficult for you. It is assumed that you are familiar with subroutines and use them constantly in your programming. You have been using subroutines since the very first program in this manual. When you wrote:

   call  get_num

you called a subroutine in ASMHELP.OBJ. Now you are going to write subroutines yourself and have them call each other.

There are different template files for programs with subroutines. They are SUBTEMP1.ASM and SUBTEMP2.ASM. We will start with SUBTEMP1. It has the entry subroutine and a space for additional subroutines. The entry subroutine is the subroutine where the operating system starts the program; it does the necessary initialization and has special code for that.

You will see some additions to the normal template file. At the top is the line:

   INCLUDE \pushregs.mac

What this is will be explained later, but you must put the file PUSHREGS.MAC in the root directory of your current drive. You will find it in the \TEMPLATE subdirectory. At the end of SUBTEMP1.ASM is:

   ; + + + + + + + + + + + + START SUBROUTINES BELOW THIS LINE

   ; + + + + + + + + + + + + END SUBROUTINES ABOVE THIS LINE

This is where you will write all the subroutines except the entry subroutine, which is still in the same place as before. All data for all subroutines still goes in the DATASTUFF segment.

Our first program will just call subroutines which will print out messages.
Using SUBTEMP1.ASM, it looks like this:

   ;prog1.asm
   ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
   main_message  db  "This is the entry routine.", 0
   sub1_message  db  "This is subroutine1.", 0
   sub2_message  db  "This is subroutine2.", 0
   sub3_message  db  "This is subroutine 3.", 0
   ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

   ; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
          mov   ax, offset main_message
          call  print_string
          call  sub1
          mov   ax, offset main_message
          call  print_string
   ; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

   ; + + + + + + + + + + + + START SUBROUTINES BELOW THIS LINE
   ;------------
   sub1   proc near
          push  ax
          mov   ax, offset sub1_message
          call  print_string
          call  sub2
          mov   ax, offset sub1_message
          call  print_string
          pop   ax
          ret
   sub1   endp
   ;------------
   sub2   proc near
          push  ax
          mov   ax, offset sub2_message
          call  print_string
          call  sub3
          mov   ax, offset sub2_message
          call  print_string
          pop   ax
          ret
   sub2   endp
   ;------------
   sub3   proc near
          push  ax
          mov   ax, offset sub3_message
          call  print_string
          pop   ax
          ret
   sub3   endp
   ; ----------
   ; + + + + + + + + + + + + END SUBROUTINES ABOVE THIS LINE

______________________
The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson

The data consists of messages to be printed by print_string. Print_string prints a zero terminated string (the number zero, not the character '0'), so there must be a zero after each message in the data segment.

The entry subroutine prints a message and then calls sub1, the first subroutine, which prints a message and calls sub2, which prints a message and calls sub3. Sub3 prints a message and then returns to sub2, which prints a message and returns to sub1, which prints a message and returns to the entry routine, which prints a message and then exits. This program should print 7 messages in all.
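To make the zero terminator concrete, here is a minimal sketch of how a routine like print_string could walk a string. It is NOT the code in ASMHELP.OBJ, whose source is not shown here. The name my_print_string and the labels mps_next and mps_done are made up for illustration, and the sketch assumes the program runs under DOS, that AX holds the offset of the string, and that DS already points to the segment containing it.

```asm
; Hypothetical stand-in for print_string (NOT the ASMHELP.OBJ source).
; Assumes: AX = offset of a zero terminated string in the segment
; that DS points to, and that the program runs under DOS.
my_print_string proc near
        push ax                 ; save every register we are about to use
        push dx
        push si
        mov  si, ax             ; SI walks along the string
mps_next:
        mov  dl, [si]           ; fetch the next byte of the string
        cmp  dl, 0              ; the number zero (not '0') ends the string
        je   mps_done
        mov  ah, 2              ; DOS function 02h: display the character in DL
        int  21h
        inc  si                 ; move on to the next byte
        jmp  mps_next
mps_done:
        pop  si                 ; restore in the reverse order of the pushes
        pop  dx
        pop  ax
        ret
my_print_string endp
```

It would be called the same way as print_string: load the offset of a message into AX and then call my_print_string. Leave the zero off the end of a message and the loop keeps printing whatever bytes follow it in memory until it happens to meet a zero.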
You will notice that the first thing that each subroutine does is save the value in AX, since it uses the AX register. This is the cardinal rule of robustness at the assembler level.

   IF YOU USE A REGISTER, YOU MUST SAVE ITS VALUE BY PUSHING IT ON THE STACK; YOU MUST THEN RESTORE THE VALUE JUST BEFORE EXITING.

It is impossible to overstress this. The routines which call your routine might rely on the registers remaining unaltered. If you disobey this rule and alter the registers, you'll be sorry.

Why doesn't the entry routine push and pop the registers it uses? Well, the operating system assumes the registers will contain trash upon return from the program, so it relies on nothing in the data registers.

All the subroutines except the entry routine are near routines. We will only use near routines.

Assemble this program, link it and run it. If it works ok, it is then time for program 2, which is the same as program 1, but is in two files.

Often, we want parts of a program in different files. Perhaps parts are standard subprograms which you have already written and assembled, perhaps the total program is too large to be handled comfortably in one file, perhaps different people are writing different parts of the program. Not only must we write the programs, but we must be able to connect them.

We will put the entry routine, sub2 and the associated data in the file built from SUBTEMP1.ASM. We will put sub1, sub3, and the associated data in the file built from SUBTEMP2.ASM.

Take a look at SUBTEMP2.ASM. It is slightly different. First, it does not have the variables that you need for set_reg_style (ax_byte, bx_byte, etc.) but it does have EXTRN statements for them. This means that you can change the register style from this file. SUBTEMP1.ASM has these variables declared PUBLIC so the linker can join them correctly.{1} We will talk about the correct way to declare external data later. SUBTEMP2.ASM has no stack segment, though there could be one. There is no entry subroutine.
Therefore at the very end, you have the line:

   END

with nothing after it. In SUBTEMP1.ASM, you have

   END  start

so the assembler and linker know that the program begins at the label "start".

____________________
1. The reason for having only one set of variables for the style is so that every time you change one of the style variables, the array is updated. If you had two different arrays you could have two different sets of information for set_reg_style.

Let's do the two programs. Here are the data, the entry code and the subroutine code from the first file.

   ;prog1.asm
   ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
   main_message  db  "This is the entry routine.", 0
   sub2_message  db  "This is subroutine2.", 0
   ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

   ; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
          PUBLIC sub2
          EXTRN  sub1:NEAR, sub3:NEAR

          mov   ax, offset main_message
          call  print_string
          call  sub1
          mov   ax, offset main_message
          call  print_string
   ; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

   ; + + + + + + + + + + + + START SUBROUTINES BELOW THIS LINE
   sub2   proc near
          push  ax
          mov   ax, offset sub2_message
          call  print_string
          call  sub3
          mov   ax, offset sub2_message
          call  print_string
          pop   ax
          ret
   sub2   endp
   ; + + + + + + + + + + + + END SUBROUTINES ABOVE THIS LINE

Notice that sub1 and sub3 have been declared EXTRN before they were referenced, and the EXTRN statement tells the assembler that they are both near procedures. sub2 has been declared PUBLIC so the assembler will give the address of sub2 to the linker. Here's the data and code for the other file.
   ;prog2.asm
   ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
   sub1_message  db  "This is subroutine1.", 0
   sub3_message  db  "This is subroutine 3.", 0
   ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

   ; + + + + + + + + + + + + START SUBROUTINES BELOW THIS LINE
          PUBLIC sub1, sub3
          EXTRN  sub2:NEAR
   ;------------
   sub1   proc near
          push  ax
          mov   ax, offset sub1_message
          call  print_string
          call  sub2
          mov   ax, offset sub1_message
          call  print_string
          pop   ax
          ret
   sub1   endp
   ;------------
   sub3   proc near
          push  ax
          mov   ax, offset sub3_message
          call  print_string
          pop   ax
          ret
   sub3   endp
   ; ----------
   ; + + + + + + + + + + + + END SUBROUTINES ABOVE THIS LINE

Here sub1 and sub3 have been declared PUBLIC and sub2 has been declared EXTRN. Assemble both programs and then link all three object files:

   link prog1+prog2+\asmhelp.obj

assuming that asmhelp is in the root directory. Run it. You should have the same results as before.

We are going to do one more thing with the same two files. Without changing any of the code, we are going to put the data for prog1 in prog2 and the data for prog2 in prog1 like this.

   ;prog1
   ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
   sub1_message  db  "This is subroutine1.", 0
   sub3_message  db  "This is subroutine 3.", 0
   ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

   ;prog2
   ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
   main_message  db  "This is the entry routine.", 0
   sub2_message  db  "This is subroutine2.", 0
   ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

So far, so good. Obviously we are going to need some PUBLIC statements and some EXTRN statements so the linker can link the four messages, but where do they go and what do they look like? The PUBLIC statements are the easiest. Put them in the segment where the message data appears, either before or after the data declaration. The EXTRN statement is a little more complicated.
First, all data is declared EXTRN by giving the variable name followed by a colon followed by its data type. The data types are BYTE, WORD, DWORD (4 bytes), QWORD (quadword or 8 bytes), and TBYTE (10 bytes). These are the standard 8086/8087 data sizes. Therefore we have:

   EXTRN  sub1_message:BYTE, sub3_message:BYTE

in prog2.asm and:

   EXTRN  main_message:BYTE, sub2_message:BYTE

in prog1.asm. Where do they go? In order to know that, we need to talk about segment overrides again.

You will remember from our discussion of the ASSUME statement that every time the assembler writes code with a variable, it checks the ASSUME statements to see which segment register(s) have the address of the segment that that variable is in. If we have:

   ASSUME cs:SEG1, ds:SEG2, es:SEG3, ss:SEG4

then if variable1 is in SEG2, the assembler will write no override in the code since DS is the 8086 default segment.

   MACHINE CODE       ASSEMBLER INSTRUCTION
   A1 0000            mov  ax, variable1

If variable1 is in SEG1 or SEG3 or SEG4, the assembler will write the appropriate segment override in the code.

   MACHINE CODE       ASSEMBLER INSTRUCTION
   2E: A1 0000        mov  ax, variable1      ; in SEG1 (CS override)
   26: A1 0000        mov  ax, variable1      ; in SEG3 (ES override)
   36: A1 0000        mov  ax, variable1      ; in SEG4 (SS override)

(By the way, those zeros just mean that the variable is at offset 0000 from the beginning of the segment).

The same thing happens when you have an EXTRN statement. The assembler associates the externally declared variable with the segment it is declared in. When the variable is used, the assembler then goes through the same actions as if the variable were actually in that segment. Let's declare variable5 external with:

   EXTRN  variable5:WORD

If we have:

   ASSUME cs:SEG1, ds:SEG2, es:SEG3, ss:SEG4

then if variable5 is declared external in SEG2, the assembler will write no override in the code since DS is the 8086 default segment.
   MACHINE CODE       ASSEMBLER INSTRUCTION
   A1 0000 E          mov  ax, variable5

If variable5 is declared external in SEG1 or SEG3 or SEG4, the assembler will write the appropriate segment override in the code.

   MACHINE CODE       ASSEMBLER INSTRUCTION
   2E: A1 0000 E      mov  ax, variable5      ; declared in SEG1 (CS override)
   26: A1 0000 E      mov  ax, variable5      ; declared in SEG3 (ES override)
   36: A1 0000 E      mov  ax, variable5      ; declared in SEG4 (SS override)

The "E" after the machine code means that the assembler knows that the variable is external and it will tell the linker so the linker can put the correct offset address at that point in the machine code. Remember, as always, that it is your responsibility to have the correct segment address in the segment register before using a variable.

Now we know where it goes. When you declare a variable external, you must put the EXTRN statement in a segment which uses the same segment register as the EXTRN variable is going to use. If the EXTRN variable will use DS, then the segment where the EXTRN statement is must use DS. If the variable uses ES, then the segment the EXTRN statement is in must use ES. In other words, the ASSUME statement for the segment the EXTRN statement is in must match EXACTLY the ASSUME statement you would write if the variable were internal, not external.{2} Normally, this is DS, but in special circumstances you might want something else. Also, if there is no segment that exactly matches what you want, then you need to create a dummy segment:

   DUMMY_SEG  SEGMENT
          EXTRN  variable7:QWORD
   DUMMY_SEG  ENDS

and make the assume statement that you want:

   ASSUME es:DUMMY_SEG

____________________
2. This means that if the segment with the EXTRN statement has more than one segment register in the assume statement:

   ASSUME ds:MORESTUFF, es:MORESTUFF

then both those registers must be set to the segment of the external variable when using it or your results may be unreliable.

What segment has DS in an ASSUME statement?
DATASTUFF in both files, so that is where the EXTRN declaration goes - in the DATASTUFF segment.

   ;prog1
   ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
   PUBLIC sub1_message, sub3_message
   EXTRN  main_message:BYTE, sub2_message:BYTE

   sub1_message  db  "This is subroutine1.", 0
   sub3_message  db  "This is subroutine 3.", 0
   ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

   ;prog2
   ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
   PUBLIC main_message, sub2_message
   EXTRN  sub1_message:BYTE, sub3_message:BYTE

   main_message  db  "This is the entry routine.", 0
   sub2_message  db  "This is subroutine2.", 0
   ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

Change the data in the two files, assemble them again and link them again:

   link prog1+prog2+\asmhelp.obj

You should get the same results as before. We are now through with these programs. Make sure you understand how to define PUBLIC and EXTRN procedures and PUBLIC and EXTRN data before going on, since we are not going to cover it again. Everything else in this chapter will be done with a single file in order to make life easier.


PASSING DATA

When you pass data to the routines in ASMHELP.OBJ, you always pass it through the AX register. The reason for this is that you needed to use these routines before you knew much about 8086 assembler language. It is solely for the convenience of beginners and is totally non-standard. In the real world, when you call a subroutine you ALWAYS pass the data on the stack, no matter which language you are using. If you have the C statement:

   my_procedure (variable1, variable2, variable3) ;

then the C compiler will generate the following code:{3}

          push  variable3
          push  variable2
          push  variable1
          call  my_procedure

The C language pushes these variables in right to left order. Before the call instruction is executed variable1 is on the top of the stack, variable2 is the next down, and variable3 is third on the stack.
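The push order can be checked without writing my_procedure at all. The following sketch is meant for the entry routine's code area. It assumes three WORD variables with the made-up values 1, 2 and 3 in DATASTUFF, and it assumes that show_regs from ASMHELP.OBJ displays AX, BX and CX along with the other registers.

```asm
; Assumed for illustration: three WORD variables in the DATASTUFF segment
;     variable1  dw  1
;     variable2  dw  2
;     variable3  dw  3

        push variable3          ; pushed first, so it ends up deepest
        push variable2
        push variable1          ; pushed last, so it is the stack top

        ; The stack now looks like this (one word per entry):
        ;     SS:SP     variable1   (stack top)
        ;     SS:SP+2   variable2
        ;     SS:SP+4   variable3

        pop  ax                 ; AX = variable1 (1)
        pop  bx                 ; BX = variable2 (2)
        pop  cx                 ; CX = variable3 (3)
        call show_regs          ; display the registers to check the order
```

The three pops put the stack back the way it was, so the sketch can be dropped into a program without disturbing anything after it.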
Is variable1 still on the stack top after the call instruction is executed? No. The call instruction pushes either one or two words on the stack. Before you go any farther with subroutines you need to know how the call and return instructions operate.

Every time you have used show_regs, both CS (the code segment address) and IP (the instruction pointer) have been displayed. What does IP do? When the 8086 is ready to execute an instruction, it takes IP, combines it with CS to calculate the total address (CS times 16 plus IP), and gets the instruction at that address. It then immediately figures out how long the instruction is going to be and adds that amount to IP.{4} What this means is that at any time, IP points to the NEXT instruction, not the current instruction. When you execute a call, the 8086 changes IP to point to the first byte of the called subroutine, so the next instruction executed is the first instruction of the called subroutine.

There are two different types of procedures, near procedures and far procedures. In a near procedure, you keep CS, the code segment register, the same. In a far procedure you change CS. So, when you call a near procedure you change one thing (IP) and when you call a far procedure you change two things (IP and CS). When you want to get back from the subroutine, you need to have CS with the segment of the calling routine and IP with the address of the instruction after the call. What are the mechanics of all this? Let's take a near procedure first.

In a near call, the 8086 first changes the instruction pointer to point to the next instruction. It then pushes IP on the stack and loads IP with the address of the called subroutine, which it works out from bytes 2 and 3 of the call instruction (a direct near call stores the target as a 16 bit distance from the next instruction). IP now points to the called subroutine. There is one more word (2 bytes) on the stack. At the end of the called subroutine, a NEAR return (ret) pops the top word off the stack into IP. IP then points to the instruction after the call instruction.
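You can watch the near call push IP with a small experiment. This is only a sketch: the routine where_am_i and the label return_point are made-up names, and it assumes that show_regs displays AX. The routine deliberately hands a value back in AX, so it is an exception to the rule about preserving registers.

```asm
; In the code area:
        call where_am_i         ; near call: pushes the offset of return_point
return_point:
        call show_regs          ; AX should hold the offset of return_point

; In the subroutine area:
where_am_i proc near
        pop  ax                 ; the top word of the stack is the saved IP
        push ax                 ; put it back, or ret would pop the wrong word
        ret                     ; near return: pops the saved IP back into IP
where_am_i endp
```

Comparing the value in AX with the offset shown for return_point in the assembler listing is one way to confirm that the word on the stack really is the address of the instruction after the call.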
In a far call, the 8086 first changes the instruction pointer (IP) to point to the next instruction. It then pushes CS on the stack, followed by IP. It then loads the offset address of the called subroutine in IP and the segment address of the called subroutine in CS. This new IP is in bytes 2 and 3 of the call instruction and the new CS is in bytes 4 and 5 of the call instruction. IP and CS now have the address of the called subroutine. The stack holds two more words (4 bytes). The old IP is the stack top and the old CS is next on the stack. At the end of the subroutine, a FAR return (ret) pops the stack top into IP, then pops the next stack item into CS. Now IP and CS point to the instruction after the call instruction.

____________________
3. You C fanatics will notice that there are some initial underscores missing. Let's not confuse the issue.

4. Instructions can vary from one byte long to six bytes long, and the 8086 can tell from the first (or first and second) byte(s) how long the total instruction will be.

There are two different types of call and they have two different machine codes. There are two different types of return and they have two different machine codes.

   MACHINE CODE       ASSEMBLER INSTRUCTIONS

                      ; a far routine
                      ;-----
                      far_routine proc far
   CB                        ret
                      far_routine endp
                      ;-----

                      ; a near routine
                      ;-----
                      near_routine proc near
   C3                        ret
                      near_routine endp
                      ;-----

                      ; a near and far call
   E8 0A43 R                 call near_routine
   9A 015C ---- R            call far_routine

The machine code for a near return is C3; for a far return it's CB. The machine code for a near call is E8; for a far call it's 9A. In the listing, the near call shows the address of the called routine (0A43h) in the following two bytes. The far call has the address of the called routine (015Ch) in the next two bytes followed by the segment of the called routine. The segment address isn't there yet.
It will be put there by the linker and loader, but the assembler has saved the space for the address. That's why the dashes are there. Remember, the R is there because those addresses might be relocated by the linker or the loader.

You tell the assembler whether to code a near return or far return by telling it whether it is a near or a far procedure.

   routine1  proc near
   routine2  proc far

How does the assembler know whether to code a near or far call? If it has already seen the procedure, it knows what type it is. If it hasn't seen it yet, it uses the default type.{5} If it is an external subroutine, the assembler knows because you have written an EXTRN statement.

   EXTRN  routine3:NEAR, routine4:FAR

This EXTRN statement should appear before the call.

____________________
5. The default is near for what we are doing. However, Microsoft has something called "simplified" directives and the default changes in these cases.

What if the routine appears after the call in the source file but it isn't the default type? You can override the default type.

          call  NEAR PTR routine5
          call  FAR PTR routine6

This is the same cumbersome syntax that we had with pointers to data, but it's the only game in town. Normally, if the subroutine appears after the call, you don't need to do anything if it is a near call but you need to put a FAR PTR override if it is a far call.
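Here is how that rule of thumb might look in a single file. The routine names later_near and later_far are made up for illustration, and both routines do nothing but return. Both calls come before the assembler has seen either routine, and the sketch assumes the default type is near, as it is in this chapter.

```asm
; In the code area, before the assembler has seen either routine:
        call later_near             ; forward near call: default type, no override
        call FAR PTR later_far      ; forward far call: must be overridden

; In the subroutine area (hypothetical routines that only return):
later_near proc near
        ret                         ; assembled as C3: pops IP
later_near endp

later_far proc far
        ret                         ; assembled as CB: pops IP, then CS
later_far endp
```

Without the FAR PTR override the assembler would code a near call to later_far. The far return at the end of the routine would then pop two words where the call had pushed only one, and the program would go astray.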